10 May, 2021

Introduction

Covid-19 and goals for the project

Materials and methods

Datasets and workflow

Datasets used in this project:

  • Covid-19 data from the GitHub repository of Johns Hopkins University
  • Country-level demographic and social indicators from Gapminder and the World Bank
  • Country latitude and longitude from the “maps” package dataset

Cleaning & Augmenting

Cleaning: issues with the datasets:

  • Time-series data in a very wide format (one column per date)
  • Country names were not consistent across data sources
  • Multiple files had to be combined
  • Recoveries are reported inconsistently
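The slides list these cleaning issues without showing code. As a minimal sketch, assuming the JHU CSSE global time-series layout (Province/State, Country/Region, Lat, Long, then one column per date), the Python below reshapes a wide file into long (country, date, value) rows, renames countries through a lookup table, and sums provinces to country level. The alias table `COUNTRY_ALIASES` and the toy CSV are hypothetical; the project's actual name mapping is not shown.

```python
import csv
import io

# Hypothetical lookup: JHU name -> name used in the other sources.
# The project's real mapping is not shown in the slides.
COUNTRY_ALIASES = {
    "US": "United States",
    "Korea, South": "South Korea",
}


def wide_to_long(csv_text, aliases=COUNTRY_ALIASES):
    """Turn a wide JHU-style time series into (country, date, value) rows.

    Assumes columns Province/State, Country/Region, Lat, Long, then one
    column per date. Provinces of the same country are summed.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    dates = header[4:]
    totals = {}  # (country, date) -> summed value; dicts keep insertion order
    for row in reader:
        country = aliases.get(row[1], row[1])
        for date, value in zip(dates, row[4:]):
            key = (country, date)
            totals[key] = totals.get(key, 0) + int(value)
    return [(country, date, value) for (country, date), value in totals.items()]


# Toy input, not real data.
sample = """Province/State,Country/Region,Lat,Long,1/22/20,1/23/20
,US,0,0,1,2
Alberta,Canada,0,0,0,1
Ontario,Canada,0,0,2,3
"""
for record in wide_to_long(sample):
    print(record)
```

Once names agree across sources, the long rows can be joined to country-level indicators by country name.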

Augmentation:

  • Calculate cases, deaths, and recoveries per 100K inhabitants
  • Add rolling means and daily new cases
  • For the Shiny app: join latitude and longitude data to the country-level Covid data
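A minimal sketch of these augmentation steps in plain Python. Two choices are assumptions rather than details from the slides: the rolling mean uses a trailing 7-day window, and negative day-to-day differences (downward corrections in cumulative counts) are clipped to zero.

```python
def per_100k(count, population):
    """Scale an absolute count to a rate per 100,000 inhabitants."""
    return count * 100_000 / population


def daily_new(cumulative):
    """Daily new counts from a cumulative series.

    The first day keeps its cumulative value. Negative differences
    (downward corrections) are clipped to zero, which is an assumption.
    """
    new, previous = [], 0
    for total in cumulative:
        new.append(max(total - previous, 0))
        previous = total
    return new


def rolling_mean(values, window=7):
    """Trailing rolling mean; None until a full window is available."""
    out, running = [], 0.0
    for i, value in enumerate(values):
        running += value
        if i >= window:
            running -= values[i - window]
        out.append(running / window if i >= window - 1 else None)
    return out


cumulative_cases = [1, 3, 6, 6, 10]
print(daily_new(cumulative_cases))  # [1, 2, 3, 0, 4]
print(per_100k(50, 1_000_000))      # 5.0
```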

Data exploration and models

Initial exploratory data analysis (EDA)

Modelling:

  • PCA: any trends, clusters or outliers across countries in the socio-economic variables, and how do they relate to cases?
  • Two linear regression models, \(y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \epsilon\):
    • \(x_1\) = population % aged 65 and above, \(x_2\) = urban population %; grouped by income level
    • \(x_1\) = GDP, \(x_2\) = population density; grouped by region
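The slides do not show how the models were fitted. The sketch below is a hypothetical plain-Python version: it solves the least-squares normal equations for \(y = \beta_0 + \beta_1 x_1 + \beta_2 x_2\) using centred sums and fits one model per group. It returns point estimates only; the slope significance reported in the results would additionally need standard errors and t-tests, which a statistics package provides.

```python
from collections import defaultdict


def fit_two_predictors(x1, x2, y):
    """OLS estimates (b0, b1, b2) for y = b0 + b1*x1 + b2*x2, via centred sums."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    # Normal equations: s11*b1 + s12*b2 = s1y and s12*b1 + s22*b2 = s2y.
    det = s11 * s22 - s12 ** 2
    if det == 0:
        raise ValueError("x1 and x2 are perfectly collinear (or constant)")
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    b0 = my - b1 * m1 - b2 * m2
    return b0, b1, b2


def fit_by_group(rows, group_key, x1_key, x2_key, y_key):
    """Fit one two-predictor model per group; rows are dicts."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append(row)
    return {
        group: fit_two_predictors(
            [r[x1_key] for r in members],
            [r[x2_key] for r in members],
            [r[y_key] for r in members],
        )
        for group, members in groups.items()
    }
```

For example, with hypothetical column names, `fit_by_group(rows, "income_group", "pop_65_pct", "urban_pct", "deaths_per_100k")` would return a dict mapping each income group to its \((\beta_0, \beta_1, \beta_2)\).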

Exploring Covid “waves” and case fatality
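The slides do not define case fatality. A common definition, assumed here, is the case fatality ratio (CFR): cumulative deaths divided by cumulative confirmed cases, in percent. The sketch also accepts an optional lag, since deaths trail diagnoses; the lag parameter `lag_days` is an illustrative addition, not something the slides specify.

```python
def case_fatality_ratio(cum_deaths, cum_cases, lag_days=0):
    """Daily CFR (%): deaths on day t divided by cases on day t - lag_days.

    lag_days=0 gives the naive CFR. Days without a lagged case count,
    or with zero cases, get None.
    """
    ratios = []
    for t, deaths in enumerate(cum_deaths):
        s = t - lag_days
        if s < 0 or cum_cases[s] == 0:
            ratios.append(None)
        else:
            ratios.append(100 * deaths / cum_cases[s])
    return ratios
```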

Results - Exploratory Data Analysis

Number of cases per 100 thousand in each region

Visualise correlation between income groups

Results - PCA

PCA on continuous socio-economic features

  • PC1 captures differences in cases fairly well
  • Some countries are outliers in this projection
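The slides do not say which features or software were used for the PCA. As an illustration only, the sketch below computes the first principal component in plain Python: features are standardised, their correlation matrix is formed, and power iteration finds its leading eigenvector. A library routine would normally be used instead; this just shows what PC1 is. The returned per-country scores could then be plotted against cases per 100K to see how well PC1 separates countries.

```python
import math


def standardise(columns):
    """Z-score each feature column (population standard deviation)."""
    result = []
    for col in columns:
        n = len(col)
        mean = sum(col) / n
        sd = math.sqrt(sum((v - mean) ** 2 for v in col) / n)
        result.append([(v - mean) / sd for v in col])
    return result


def first_principal_component(columns, iterations=200):
    """PC1 loadings, per-observation scores and share of variance explained.

    `columns` holds one list per feature, each with one value per country.
    Power iteration on the correlation matrix; assumes the largest eigenvalue
    is strictly larger than the second. The sign of PC1 is arbitrary.
    """
    z = standardise(columns)
    p, n = len(z), len(z[0])
    corr = [[sum(a * b for a, b in zip(z[i], z[j])) / n for j in range(p)]
            for i in range(p)]
    v = [1.0 / (i + 1) for i in range(p)]  # arbitrary non-uniform start vector
    for _ in range(iterations):
        w = [sum(corr[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * sum(corr[i][j] * v[j] for j in range(p)) for i in range(p))
    scores = [sum(v[i] * z[i][k] for i in range(p)) for k in range(n)]
    # Eigenvalues of a correlation matrix sum to p, so this is PC1's variance share.
    return v, scores, eigenvalue / p
```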

Results - Linear regression model

Deaths as a function of the population % aged 65 and above and the population % living in urban areas

Significant slopes:

  • Population % aged 65 and above in the high, upper-middle and lower-middle income groups
  • Urban population % in the upper-middle income group

Results - Identifying covid waves

What is a good criterion for a wave?

  • We found that a weekly increase in deaths of 10% is a good identifier of a wave
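A minimal sketch of the wave criterion, under assumptions the slides leave open: weeks are consecutive 7-day blocks of daily deaths, and "an increase of 10%" means a week's total is at least 10% above the previous week's. Each week is judged on its own, and a week following a week with zero deaths is not flagged, to avoid dividing by zero.

```python
def weekly_totals(daily_deaths):
    """Sum daily deaths into consecutive 7-day blocks (a trailing partial week is dropped)."""
    return [sum(daily_deaths[i:i + 7]) for i in range(0, len(daily_deaths) - 6, 7)]


def in_wave(weekly_deaths, threshold=0.10):
    """Flag each week whose deaths rose by at least `threshold` versus the previous week."""
    flags = [False]  # the first week has no previous week to compare with
    for prev, cur in zip(weekly_deaths, weekly_deaths[1:]):
        flags.append(prev > 0 and (cur - prev) / prev >= threshold)
    return flags
```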

Do Covid-19 waves appear to be synchronized in different countries?
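To compare regions, one option (a sketch, not necessarily how the figures were made) is to compute, for each region and week, the percentage of its countries flagged as in a wave, for example with the 10% weekly-increase flags. Both inputs, the per-country weekly flags and the country-to-region lookup, are hypothetical here.

```python
from collections import defaultdict


def share_in_wave(flags_by_country, region_of):
    """Percentage of each region's countries flagged as in a wave, per week.

    flags_by_country: country -> list of weekly booleans (same length for all).
    region_of: country -> region name.
    """
    members = defaultdict(list)
    for country, flags in flags_by_country.items():
        members[region_of[country]].append(flags)
    shares = {}
    for region, all_flags in members.items():
        n_weeks = len(all_flags[0])
        shares[region] = [
            100 * sum(flags[w] for flags in all_flags) / len(all_flags)
            for w in range(n_weeks)
        ]
    return shares
```

Taking `max()` over a region's list gives its peak share of countries in a wave, the kind of figure quoted in the discussion for Europe & Central Asia.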

Discussion

In general, the EDA and linear regression suggest that more developed countries were hit “harder” by Covid-19.

  • This could be because less developed, lower-income countries have a younger average age (reflecting higher general mortality rates), and younger people are less severely affected by infection.
  • However, data quality and under-reporting may differ between countries.

Things to note

  • Correlation does not imply causation.
  • The linear models do not account for collinearity between features.
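One quick way to gauge the collinearity the models ignore, sketched here as a suggestion rather than something the project did: with two predictors, both share the same variance inflation factor, \(\mathrm{VIF} = 1/(1 - r^2)\), where \(r\) is the Pearson correlation between \(x_1\) and \(x_2\). Common rules of thumb treat values above roughly 5 to 10 as a concern.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def vif_two_predictors(x1, x2):
    """VIF shared by both predictors of a two-predictor linear model."""
    r = pearson_r(x1, x2)
    if r ** 2 >= 1:
        return math.inf  # perfectly collinear predictors
    return 1 / (1 - r ** 2)
```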

We see that Covid “waves” are not synchronous across global regions.

  • Some regions had visually very distinct wave peaks over time (Europe & Central Asia), while others were less distinct (Asia & Pacific)
  • Europe & Central Asia had the highest percentage of countries in a wave at any one time: at its highest peak, almost 70% of the region's countries were in a wave.
Photo from: https://constructionexec.com/article/how-a-second-covid-19-wave-is-changing-construction-manufacturing

Covid-19 overview Shiny App

Questions